Machine Learning Tools for Predicting Freshwater Fish Populations (ICRW7 Proceedings)
To address the lack of publicly available fish community data for most US lotic (flowing-water) freshwater habitats, we develop scientific software modules and databases that predict fish populations for each NHDPlus (National Hydrography Dataset Plus) stream segment, identified by its ComId (common identifier). We build predictive models of fish species presence in freshwater streams in the conterminous United States (CONUS) using several customized Scikit-learn (Pedregosa and others 2011) machine learning pipelines. The dataset is derived from EPA, USGS, and state agency records. It contains 565 fish species observed by electrofishing at sampling locations in 28,519 stream segments, each identified by its NHDPlus ComId. From these observations we build a binary dataset for each species: every sampled ComId where the species was found at least once by electrofishing is labeled present (1). Then, for each species, we take the collection of HUC8s (8-digit Hydrologic Unit Codes) where that species may be found and label the remaining sampled ComIds within those HUC8s as absent (0).
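The presence/absence labeling step can be sketched in a few lines of Python, the language of the Scikit-learn tooling. This is a minimal illustration under stated assumptions, not the authors' implementation:

- Observations are assumed to arrive as (species, ComId) pairs.
- A mapping from every sampled ComId to its HUC8 is assumed to be available.
- The function name `build_presence_absence` and these input formats are hypothetical.
- The abstract does not say how each species' HUC8 collection is obtained. The sketch therefore accepts it as an optional input and otherwise falls back to the HUC8s that contain at least one recorded presence.
- As one reading of the procedure, sampled ComIds outside those HUC8s are left unlabeled rather than marked absent.

```python
from collections import defaultdict


def build_presence_absence(observations, comid_to_huc8, species_huc8s=None):
    """Build binary presence (1) / absence (0) labels for every species.

    observations:  iterable of (species, comid) pairs, one per electrofishing
                   record (hypothetical input format)
    comid_to_huc8: dict mapping every sampled ComId (including every ComId in
                   observations) to its HUC8 code
    species_huc8s: optional dict mapping species -> set of HUC8 codes where it
                   may be found; when a species is missing, we ASSUME its range
                   is the set of HUC8s containing at least one presence
    Returns a dict: species -> {comid: 1 or 0}.
    """
    # ComIds where each species was found at least once.
    present = defaultdict(set)
    for species, comid in observations:
        present[species].add(comid)

    # Group all sampled ComIds by HUC8 so absences can be drawn per range.
    comids_by_huc8 = defaultdict(set)
    for comid, huc8 in comid_to_huc8.items():
        comids_by_huc8[huc8].add(comid)

    labels = {}
    for species, found in present.items():
        if species_huc8s is not None and species in species_huc8s:
            huc8s = species_huc8s[species]
        else:
            # Assumed fallback: HUC8s holding an observed presence.
            huc8s = {comid_to_huc8[c] for c in found}

        species_labels = {c: 1 for c in found}
        for huc8 in huc8s:
            for comid in comids_by_huc8.get(huc8, ()):
                # Sampled inside the range but never found -> absent (0);
                # setdefault keeps existing presence labels untouched.
                species_labels.setdefault(comid, 0)
        labels[species] = species_labels
    return labels


if __name__ == "__main__":
    # Toy data with made-up species names and ComIds.
    obs = [("species_A", 101), ("species_A", 102), ("species_B", 102)]
    c2h = {101: "01010001", 102: "01010001", 103: "01010001", 104: "02020002"}
    labels = build_presence_absence(obs, c2h)
    print(sorted(labels["species_A"].items()))  # [(101, 1), (102, 1), (103, 0)]
    print(sorted(labels["species_B"].items()))  # [(101, 0), (102, 1), (103, 0)]
```

In the toy run, ComId 104 gets no label for either species because its HUC8 contains no presence. Each resulting per-species dictionary is the kind of binary target a Scikit-learn classifier could then be fit against, once it is joined with per-ComId predictor variables.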